Zero-knowledge proof

In cryptography, a zero-knowledge proof or zero-knowledge protocol is an interactive method for one party to prove to another that a (usually mathematical) statement is true, without revealing anything other than the veracity of the statement.

1 Abstract example
2 Definition
3 Practical example
4 Variants of zero-knowledge
5 Applications
6 History and results
7 See also
8 Notes
9 External links

Abstract example

Peggy randomly takes either path A or B, while Victor waits outside

Victor chooses an exit path

Peggy reliably appears at the exit Victor names

There is a well-known story presenting some of the ideas of zero-knowledge proofs, first published by Jean-Jacques Quisquater and others in their paper "How to Explain Zero-Knowledge Protocols to Your Children".^[1] It is common practice to label the two parties in a zero-knowledge proof as Peggy (the prover of the statement) and Victor (the verifier of the statement).

In this story, Peggy has uncovered the secret word used to open a magic door in a cave. The cave is shaped like a circle, with the entrance on one side and the magic door blocking the opposite side. Victor says he'll pay her for the secret, but not until he's sure that she really knows it. Peggy says she'll tell him the secret, but not until she receives the money. They devise a scheme by which Peggy can prove that she knows the word without telling it to Victor.

First, Victor waits outside the cave as Peggy goes in. They label the left and right paths from the entrance A and B. Peggy randomly takes either path A or B. Then, Victor enters the cave and shouts the name of the path he wants her to use to return, either A or B, chosen at random. Providing she really does know the magic word, this is easy: she opens the door, if necessary, and returns along the desired path. Note that Victor does not know which path she has gone down.

However, suppose she did not know the word. Then, she would only be able to return by the named path if Victor were to give the name of the same path that she had entered by. Since Victor would choose A or B at random, she would have a 50% chance of guessing correctly. If they were to repeat this trick many times, say 20 times in a row, her chance of successfully anticipating all of Victor's requests would become vanishingly small.

Thus, if Peggy reliably appears at the exit Victor names, he can conclude that she is very likely to know the secret word.

Definition

A zero-knowledge proof must satisfy three properties:

Completeness: if the statement is true, the honest verifier (that is, one following the protocol properly) will be convinced of this fact by an honest prover.
Soundness: if the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability.
Zero-knowledge: if the statement is true, no cheating verifier learns anything other than this fact. This is formalized by showing that every cheating verifier has some simulator that, given only the statement to be proven (and no access to the prover), can produce a transcript that "looks like" an interaction between the honest prover and the cheating verifier.

The first two of these are properties of more general interactive proof systems. The third is what makes the proof zero-knowledge.

Zero-knowledge proofs are not proofs in the mathematical sense of the term because there is some small probability, the soundness error, that a cheating prover will be able to convince the verifier of a false statement. In other words, they are probabilistic rather than deterministic. However, there are techniques to decrease the soundness error to negligibly small values.

A formal definition of zero-knowledge has to use some computational model, the most common one being that of a Turing machine. Let $P$ , $V$ , and $S$ be turing machines. An interactive proof system with $(P,V)$ for a language $L$ is zero-knowledge if for any probabilistic polynomial time (PPT) verifier $\hat V$ there exists an expected PPT simulator $S$ such that
$\forall x \in L, z \in \{0, 1\}^{*}, View_{\hat V} [P(x) \leftrightarrow \hat V(x, z)] = S(x, z)$

The prover $P$ is modeled as having unlimited computation power (in practice, P usually is a Probabilistic Turing machine). Intuitively, the definition states that an interactive proof system $(P,V)$ is zero- knowledge if for any verifier $\hat V$ there exists an efficient simulator $S$ that can reproduce the conversation between $P$ and $\hat V$ on any given input. The auxiliary string $z$ in the definition plays the role of “prior knowledge”. The definition implies that $\hat V$ cannot use any prior knowledge string $z$ to mine information out of its conversation with $P$ because we demand that if $S$ is also given this prior knowledge then it can reproduce the conversation between $\hat V$ and $P$ just as before.

The definition given is that of perfect zero-knowledge. Computational zero-knowledge is obtained by requiring that the views of the verifier $\hat V$ and the simulator are only computationally indistinguishable, given the auxiliary string.

Practical example

We can extend these ideas to a more realistic cryptography application. In this scenario, Peggy knows a Hamiltonian cycle for a large graph, G. Victor knows G but not the cycle (e.g., Peggy has generated G and revealed it to him.) Peggy will prove that she knows the cycle without revealing it. A Hamiltonian cycle in a graph is just one way to implement a zero knowledge proof; in fact any NP-complete problem can be used, as well as some other difficult problems such as factoring.^[2] However, Peggy does not want to simply reveal the Hamiltonian cycle or any other information to Victor; she wishes to keep the cycle secret (perhaps Victor is interested in buying it but wants verification first, or maybe Peggy is the only one who knows this information and is proving her identity to Victor).

To show that Peggy knows this Hamiltonian cycle, she and Victor play several rounds of a game.

At the beginning of each round, Peggy creates H, an isomorphic graph to G (i.e. H is just like G except that all the vertices have different names). Since it is trivial to translate a Hamiltonian cycle between isomorphic graphs with known isomorphism, if Peggy knows a Hamiltonian cycle for G she also must know one for H.
Peggy commits to H. She could do so by using a cryptographic commitment scheme. Alternatively, she could number the vertices of H, then for each edge of H write a small piece of paper containing the two vertices of the edge and then put these pieces of paper upside down on a table. The purpose of this commitment is that Peggy is not able to change H while at the same time Victor has no information about H.
Victor then randomly chooses one of two questions to ask Peggy. He can either ask her to show the isomorphism between H and G (see graph isomorphism problem), or he can ask her to show a Hamiltonian cycle in H.
If Peggy is asked to show that the two graphs are isomorphic, she first uncovers all of H (e.g. by turning all pieces of papers that she put on the table) and then provides the vertex translations that map G to H. Victor can verify that they are indeed isomorphic.
If Peggy is asked to prove that she knows a Hamiltonian cycle in H, she translates her Hamiltonian cycle in G onto H and only uncovers the edges on the Hamiltonian cycle. This is enough for Victor to check that H does indeed contain a Hamiltonian cycle.

Completeness

If Peggy is honest, she can easily satisfy Victor's demand for either a graph isomorphism (which she has) or a Hamiltonian cycle (which she can construct by applying the isomorphism to the cycle in G).

Zero-Knowledge

Peggy's answers do not reveal the original Hamiltonian cycle in G. Each round, Victor will learn only H's isomorphism to G or a Hamiltonian cycle in H. He would need both answers for a single H to discover the cycle in G, so the information remains unknown as long as Peggy can generate a unique H every round. If Peggy does not know of a Hamiltonian Cycle in G, but somehow knew in advance what Victor would ask to see each round then she could cheat. For example, if Peggy knew ahead of time that Victor would ask to see the Hamiltonian Cycle in H then she could generate a Hamiltonian cycle for an unrelated graph. Similarly, if Peggy knew in advance that Victor would ask to see the isomorphism then she could simply generate an isomorphic graph H (in which she also does not know a Hamiltonian Cycle). Victor could simulate the protocol by himself (without Peggy) because he knows what he will ask to see. Therefore, Victor gains no information about the Hamiltonian cycle in G from the information revealed in each round.

Soundness

If Peggy does not know the information, she can guess which question Victor will ask and generate either a graph isomorphic to G or a Hamiltonian cycle for an unrelated graph, but since she does not know a Hamiltonian cycle for G she cannot do both. With this guesswork, her chance of fooling Victor is 2⁻ⁿ, where n is the number of rounds. For all realistic purposes, it is infeasibly difficult to defeat a zero knowledge proof with a reasonable number of rounds in this way.

Variants of zero-knowledge

Different variants of zero-knowledge can be defined by formalizing the intuitive concept of what is meant by the output of the simulator "looking like" the execution of the real proof protocol in the following ways:

We speak of perfect zero-knowledge if the distributions produced by the simulator and the proof protocol are distributed exactly the same. This is for instance the case in the first example above.

Statistical zero-knowledge means that the distributions are not necessarily exactly the same, but they are statistically close, meaning that their statistical difference is a negligible function.

We speak of computational zero-knowledge if no efficient algorithm can distinguish the two distributions.

Applications

Research in zero-knowledge proofs has been motivated by authentication systems where one party wants to prove its identity to a second party via some secret information (such as a password) but doesn't want the second party to learn anything about this secret. This is called a "zero-knowledge proof of knowledge". However, a password is typically too small or insufficiently random to be used in many schemes for zero-knowledge proofs of knowledge. A zero-knowledge password proof is a special kind of zero-knowledge proof of knowledge that addresses the limited size of passwords.

One of the most fascinating uses of zero-knowledge proofs within cryptographic protocols is to enforce honest behavior while maintaining privacy. Roughly, the idea is to force a user to prove, using a zero-knowledge proof, that its behavior is correct according to the protocol. Because of soundness, we know that the user must really act honestly in order to be able to provide a valid proof. Because of zero knowledge, we know that the user does not compromise the privacy of its secrets in the process of providing the proof. This application of zero-knowledge proofs was first used in the ground-breaking paper of Goldreich, Micali, and Wigderson on secure multiparty computation.

History and results

Zero-knowledge proofs were first conceived in 1985 by Shafi Goldwasser, Silvio Micali, and Charles Rackoff in a draft of "The Knowledge Complexity of Interactive Proof-Systems".^[3] While this landmark paper did not invent interactive proof systems, it did invent the IP hierarchy of interactive proof systems (see interactive proof system) and conceived the concept of knowledge complexity, a measurement of the amount of knowledge about the proof transferred from the prover to the verifier. They also gave the first zero-knowledge proof for a concrete problem, that of deciding quadratic nonresidues mod m. In their own words:

Of particular interest is the case where this additional knowledge is essentially 0 and we show that [it] is possible to interactively prove that a number is quadratic non residue mod m releasing 0 additional knowledge. This is surprising as no efficient algorithm for deciding quadratic residuosity mod m is known when m’s factorization is not given. Moreover, all known NP proofs for this problem exhibit the prime factorization of m. This indicates that adding interaction to the proving process, may decrease the amount of knowledge that must be communicated in order to prove a theorem.

The quadratic nonresidue problem has both an NP and a co-NP algorithm, and so lies in the intersection of NP and co-NP. This was also true of several other problems for which zero-knowledge proofs were subsequently discovered, such as an unpublished proof system by Oded Goldreich verifying that a two-prime modulus is not a Blum integer.^[4]

Oded Goldreich, et al., took this one step further, showing that, assuming the existence of unbreakable encryption, one can create a zero-knowledge proof system for the NP-complete graph coloring problem with three colors. Since every problem in NP can be efficiently reduced to this problem, this means that, under this assumption, all problems in NP have zero-knowledge proofs.^[5] The reason for the assumption is that, as in the above example, their protocols require encryption. A commonly cited sufficient condition for the existence of unbreakable encryption is the existence of one-way functions, but it is conceivable that some physical means might also achieve it.

On top of this, they also showed that the graph nonisomorphism problem, the complement of the graph isomorphism problem, has a zero-knowledge proof. This problem is in co-NP, but is not currently known to be in either NP or any practical class. More generally, Goldreich, Goldwasser et al. would go on to show that, also assuming unbreakable encryption, there are zero-knowledge proofs for all problems in IP=PSPACE, or in other words, anything that can be proved by an interactive proof system can be proved with zero knowledge.^[6]

Not liking to make unnecessary assumptions, many theorists sought a way to eliminate the necessity of one way functions. One way this was done was with multi-prover interactive proof systems (see interactive proof system), which have multiple independent provers instead of only one, allowing the verifier to "cross-examine" the provers in isolation to avoid being misled. It can be shown that, without any intractability assumptions, all languages in NP have zero-knowledge proofs in such a system.^[7]

It turns out that in an Internet-like setting, where multiple protocols may be executed concurrently, building zero-knowledge proofs is more challenging. The line of research investigating concurrent zero-knowledge proofs was initiated by the work of Dwork, Naor, and Sahai.^[8] One particular development along these lines has been the development of witness-indistinguishable proof protocols. The property of witness-indistinguishability is related to that of zero-knowledge, yet witness-indistinguishable protocols do not suffer from the same problems of concurrent execution.^[9]

Another variant of zero-knowledge proofs are non-interactive zero-knowledge proofs. Blum, Feldman, and Micali showed that a common random string shared between the prover and the verifier is enough to achieve computational zero-knowledge without requiring interaction.^[10]

Notes

^ Quisquater, Jean-Jacques; Guillou, Louis C.; Berson, Thomas A. (1990). "How to Explain Zero-Knowledge Protocols to Your Children". Advances in Cryptology - CRYPTO '89: Proceedings 435: 628–631. http://www.cs.wisc.edu/~mkowalcz/628.pdf.
^ http://www.bennetyee.org/ucsd-pages/ZKP.html
^ Goldwasser, S.; Micali, S.; Rackoff, C. (1989), "The knowledge complexity of interactive proof systems", SIAM Journal on Computing (Philadelphia: Society for Industrial and Applied Mathematics) 18 (1): 186–208, ISSN 1095-7111, http://crypto.cs.mcgill.ca/~crepeau/COMP647/2007/TOPIC02/GMR89.pdf
^ Goldreich, Oded (1985). "A zero-knowledge proof that a two-prime moduli is not a Blum integer". Unpublished manuscript.
^ Goldreich, Oded; Micali, Silvio; Wigderson, Avi (1991). "Proofs that yield nothing but their validity". Journal of the ACM 38 (3): 690–728. doi:10.1145/116825.116852.
^ Ben-Or, Michael; Goldreich, Oded; Goldwasser, Shafi; Hastad, Johan; Kilian, Joe; Micali, Silvio; Rogaway, Phillip (1990). "Everything provable is provable in zero-knowledge". In Goldwasser, S.. Advances in Cryptology--CRYPTO '88. Lecture Notes in Computer Science. 403. Springer-Verlag. pp. 37–56.
^ Ben-or, M.; Goldwasser, Shafi; Kilian, J.; Wigderson, A. (1988). "Multi prover interactive proofs: How to remove intractability assumptions". Proceedings of the 20th ACM Symposium on Theory of Computing: 113–121. http://theory.lcs.mit.edu/~cis/pubs/shafi/1988-stoc-bgkw.pdf.
^ Dwork, Cynthia; Naor, Moni; Sahai, Amit (2004). "Concurrent Zero Knowledge". Journal of the ACM 51 (6): 851–898. doi:10.1145/1039488.1039489.
^ Feige, Uriel; Shamir, Adi (1990). "Witness Indistinguishable and Witness Hiding Protocols". Proceedings of the twenty-second annual ACM Symposium on Theory of Computing (STOC). doi:10.1145/100216.100272.
^ Blum, Manuel; Feldman, Paul; Micali, Silvio (1988). "Non-Interactive Zero-Knowledge and Its Applications". Proceedings of the twentieth annual ACM symposium on Theory of computing (STOC 1988): 103–112. doi:10.1145/62212.62222.

External links

Applied Kid Cryptography – A simple explanation of zero-knowledge proofs using Where's Waldo? as an example
Video link A video example of the cave thought experiment
A gentle introduction to zero-knowledge proofs with applications to cryptography
How to construct zero-knowledge proof systems for NP
An efficient non-interactive statistical zero-knowledge proof system for quasi-safe prime products
A tutorial by Oded Goldreich on zero knowledge proofs
Salil Vadhan, Vadhan's phd thesis on statistical zero knowledge
Theory of Computing Course, Cornell University 2009, Zero knowledge proofs